Purpose:

Examine perinatal periods of risk and examine the spatial variation in feto-infant morality in Fulton and Dekalb counties in Georgia.

Objectives:

1. Use a spatial data class in R, ppp, which is necessary for executing the kernel estimation functions

2. Use kernel density estimation of spatial point processes, including selection of fixed bandwidths, and use of adaptive bandwidths

3. Estimate spatial relative risk surfaces, including estimation of tolerance contours

4. Estimate a statistically-optimal fixed bandwidth and explore adaptive bandwidths for use with the gwss() function

5. Calculate local, spatially-weighted mean, median, SD, and IQR for four census-tract level continuous measures using kernel density functions

6. Using Monte Carlo simulation to produces significance contours on our estimates of local, spatially- weighted summary statistics

7. Calculate local, spatially-weighted bivariate statistics summarizing how the correlations (Pearson and Spearman) of pairs of variables varies through space

Data Preparation:

library(tidyverse)
library(sp)
library(sf)
library(tmap)
library(spatstat)
library(sparr)
library(maptools)
library(raster)
library(GWmodel)


# This is points for births in Dekalb/Fulton county
b_point <- st_read('birth_points.gpkg')
## Reading layer `birth_points' from data source `/Users/brookelappe/Desktop/EPI 563/birth_points.gpkg' using driver `GPKG'
## Simple feature collection with 94373 features and 0 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1029894 ymin: 1222298 xmax: 1098234 ymax: 1301535
## projected CRS:  NAD83 / Conus Albers
# This is points for deaths in Dekalb/Fulton county
d_point <- st_read('death_points.gpkg')
## Reading layer `death_points' from data source `/Users/brookelappe/Desktop/EPI 563/death_points.gpkg' using driver `GPKG'
## Simple feature collection with 701 features and 0 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1040557 ymin: 1224819 xmax: 1095361 ymax: 1294009
## projected CRS:  NAD83 / Conus Albers
# This is an outline of Dekalb/Fulton county to be used as a study 'window'
county <- st_read('DekalbFultonWindow.gpkg') %>% 
  as('Spatial')
## Reading layer `DekalbFultonWindow' from data source `/Users/brookelappe/Desktop/EPI 563/DekalbFultonWindow.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 2 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 1026366 ymin: 1220671 xmax: 1098719 ymax: 1302019
## projected CRS:  NAD83 / Conus Albers
# This is Dekalb/Fulton census tracts
atl <- st_read('Fulton-Dekalb-covariates.gpkg') %>%
  as('Spatial') # convert to sp class
## Reading layer `Fulton-Dekalb-covariates' from data source `/Users/brookelappe/Desktop/EPI 563/Fulton-Dekalb-covariates.gpkg' using driver `GPKG'
## Simple feature collection with 345 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 1026366 ymin: -386115.2 xmax: 1098719 ymax: -304767
## projected CRS:  NAD83 / Conus Albers

Briefly examine the data sets:

summary(b_point)
##             geom      
##  POINT        :94373  
##  epsg:5070    :    0  
##  +proj=aea ...:    0
summary(d_point)
##             geom    
##  POINT        :701  
##  epsg:5070    :  0  
##  +proj=aea ...:  0

Introducing kernel density estimation (KDE).

KDE is a non-parametric tool that can estimate the varying spatial intensity of a spatial point process (spp) and estimate a spatial relative risk surface representing spatial variation in event occurance conditional on the population at risk.

Create the planar point process (ppp) data object using list of x, y coordinates for event points and definition for spatial window. Dekalb and Fulton counties are the customized spatial window for fetal births and deaths.

# create the owin class object and examine it with the code below
county_owin <-
  maptools::as.owin.SpatialPolygons(county) 

summary(county_owin)
## Window: polygonal boundary
## single connected closed polygon with 698 vertices
## enclosing rectangle: [1026366.3, 1098719.2] x [1220671, 1302019.3] units
##                      (72350 x 81350 units)
## Window area = 2086320000 square units
## Fraction of frame area: 0.354
plot(county_owin)

# create the birth ppp object
b_ppp <- ppp(x = st_coordinates(b_point)[, 1], 
             y = st_coordinates(b_point)[, 2],
             window = county_owin)

# create the death ppp object
d_ppp <- ppp(x = st_coordinates(d_point)[, 1], 
             y = st_coordinates(d_point)[, 2],
             window = county_owin)

# summarize and plot 
summary(d_ppp)
## Planar point pattern:  701 points
## Average intensity 3.359978e-07 points per square unit
## 
## Coordinates are given to 2 decimal places
## i.e. rounded to the nearest multiple of 0.01 units
## 
## Window: polygonal boundary
## single connected closed polygon with 698 vertices
## enclosing rectangle: [1026366.3, 1098719.2] x [1220671, 1302019.3] units
##                      (72350 x 81350 units)
## Window area = 2086320000 square units
## Fraction of frame area: 0.354
plot(d_ppp)

summary(b_ppp)
## Planar point pattern:  94373 points
## Average intensity 4.523412e-05 points per square unit
## 
## Coordinates are given to 2 decimal places
## i.e. rounded to the nearest multiple of 0.01 units
## 
## Window: polygonal boundary
## single connected closed polygon with 698 vertices
## enclosing rectangle: [1026366.3, 1098719.2] x [1220671, 1302019.3] units
##                      (72350 x 81350 units)
## Window area = 2086320000 square units
## Fraction of frame area: 0.354
plot(b_ppp)

The summary includes information about the overall spatial intensity, number of points, and observational window.

Bandwith Selection

Fixed bandwith: A single value if h designates that the width of the kernel (and thus the resulting smoothness of the estimated intensity surface) is the same for the entire study region. Fixed bandwidths are commonly used, but choosing a single value can be challenging in practice when the density of points varies substantially across the study region.

Adaptive bandwith: As the name implies, this approach changes the size of the kernel density bandwidth according to the density of points (data) in differing sub-areas of the overall study region. The result is is relatively more smoothing (larger bandwidth) in areas with sparse point data, and relatively less smoothing (smaller bandwidth) in areas with more point density.

Oversmoothing to identify mazimum amount of smoothing neccesary for minimizing statistical error. Use funtion OS().

# oversmoothing algorithm with function OS()
h_os_d <- OS(d_ppp)
h_os_b <- OS(b_ppp)

Fixed bandwith KDE

Use the oversmooth estimate of the upper bound for bandwith for each point process.

#fixed bandwith KDE with bivariate.density()
death_kde <- bivariate.density(d_ppp, h0 = h_os_d, edge = 'diggle') 
birth_kde <- bivariate.density(b_ppp, h0 = h_os_b, edge = 'diggle')

summary(birth_kde) 
## Bivariate Kernel Density/Intensity Estimate
## 
## Bandwidth
##   Fixed smoothing with h0 = 1897.371 units (to 4 d.p.)
## 
## No. of observations
##   94373 
## 
## Spatial bound
##   Type: polygonal
##   2D enclosure: [1026366, 1098719] x [1220671, 1302019]
## 
## Evaluation
##   128 x 128 rectangular grid
##   5808 grid cells out of 16384 fall inside study region
##   Density/intensity range [6.313906e-14, 1.585466e-09]
names(birth_kde)
## [1] "z"         "h0"        "hp"        "h"         "him"       "q"        
## [7] "gamma"     "geometric" "pp"
#quickly visualize resulting fixed bandwith KDE plot
par(mfrow = c(1, 2))
plot(birth_kde, main = 'Birth density') 
plot(death_kde, main = 'Death density')

par(mfrow = c(1,1))

Adaptive bandwith KDE

Implementation of algorithem that adapts the bandwith across study region in response to the density or sparseness of the data. Requires specification of global banthwith, and adaptation makes global smaller or larger as needed.

#adaptive bandwith KDE with bivariate.density()
death_kde_adapt <- bivariate.density(d_ppp,
                                     h0 = h_os_d,
                                     edge = 'diggle', 
                                     adapt = TRUE, 
                                     verbose = FALSE)
birth_kde_adapt <- bivariate.density(b_ppp,
                                     h0 = h_os_d,
                                     edge = 'diggle', 
                                     adapt = TRUE, 
                                     verbose = FALSE)


##quickly visualize resulting adaptive bandwith KDE plot
par(mfrow = c(1, 2))
plot(birth_kde_adapt, main = 'Birth density\n(adaptive h)') 
plot(death_kde_adapt, main = 'Death density\n(adaptive h)') 

par(mfrow = c(1,1))

Plotting KD estimates with tmap()

#convert image data to raster data
death_kde_raster <- raster(death_kde_adapt$z,
                           crs = "+init=epsg:5070")
birth_kde_raster <- raster(birth_kde_adapt$z,
                           crs = "+init=epsg:5070")

#create map of death surface
m1 <- tm_shape(death_kde_raster) + tm_raster(palette = 'BuPu',
                                             style = 'cont',
                                             title = 'Death density') + 
  tm_layout(legend.format = list(scientific = T))

#create map of brith surface
m2 <- tm_shape(birth_kde_raster) + 
  tm_raster(palette = 'BuPu',
            style = 'cont',
            title = 'Birth density') +
  tm_layout(legend.format = list(scientific = T))

tmap_arrange(m1,m2)

Creating relative risk surface manually

To manually create spatial relative risk surface, take ratio of KDE intesity surface which will give essentially an SMR (relative deviation of each area from overall average value). Values below 1 have lower than average risk, and values above 1 have higher average risk.

#create risk surface as ratio of death density to birth density
risk <- death_kde_raster / birth_kde_raster

#map SMR
tm_shape(risk) +
  tm_raster(palette = '-RdYlGn',
            style = 'cont',
            breaks = c(0.1, 0.6, 0.9, 1.1, 2, 4.9),
            title = 'IMR SMR')

Creating relative risk surface with risk() function

Sparr package provides shortcut for estimation of spatial relative risk surfaces in function risk(). Also illustrates another feature which allows for quantifying statistical precision by creating tolerance contours. Tolerance contours are simply lines which encircle regions that are statistically significant below a given threshold.

By default, the function estimates the log relative risk, which is a helpful reminder that the relative risk is asymmetric. However, we understand ratio measures, and will be careful to plot the results appropriately. For that reason, I set log = FALSE, although obviously you could omit that and keep everything on the log scale.

imr1000 <- risk(d_ppp, b_ppp, h0 = 1000, 
                tolerate = T,
                verbose = F,
                log = F,
                edge = 'diggle')
imr2000 <- risk(d_ppp, b_ppp, h0 = 2000, 
                tolerate = T,
                log = F,
                edge = 'diggle', 
                verbose = F)
imr4000 <- risk(d_ppp, b_ppp, h0 = 4000, 
                tolerate = T,
                log = F,
                edge = 'diggle', 
                verbose = F)
imr8000 <- risk(d_ppp, b_ppp, h0 = 8000, 
                tolerate = T,
                log = F,
                edge = 'diggle', 
                verbose = F)
imradapt <- risk(d_ppp, b_ppp, h0 = h_os_d,
                 adapt = T, 
                 tolerate = T, 
                 log = F,
                 edge = 'diggle', 
                 verbose = F)

summary(imr1000)
## Log-Relative Risk Function.
## 
## Estimated risk range [-1.967191e-13, 11.05946]
## 
## --Numerator (case) density--
## Bivariate Kernel Density/Intensity Estimate
## 
## Bandwidth
##   Fixed smoothing with h0 = 1000 units (to 4 d.p.)
## 
## No. of observations
##   701 
## 
## Spatial bound
##   Type: polygonal
##   2D enclosure: [1026366, 1098719] x [1220671, 1302019]
## 
## Evaluation
##   128 x 128 rectangular grid
##   5808 grid cells out of 16384 fall inside study region
##   Density/intensity range [-2.491416e-25, 2.78467e-09]
## 
## --Denominator (control) density--
## Bivariate Kernel Density/Intensity Estimate
## 
## Bandwidth
##   Fixed smoothing with h0 = 1000 units (to 4 d.p.)
## 
## No. of observations
##   94373 
## 
## Spatial bound
##   Type: polygonal
##   2D enclosure: [1026366, 1098719] x [1220671, 1302019]
## 
## Evaluation
##   128 x 128 rectangular grid
##   5808 grid cells out of 16384 fall inside study region
##   Density/intensity range [2.143127e-16, 2.383589e-09]
names(imr1000)
## [1] "rr" "f"  "g"  "P"
#visualize
par(mar = c(1,1,1,1)) 
par(mfrow = c(3,2)) 
plot(imr1000) 
plot(imr2000) 
plot(imr4000) 
plot(imr8000) 
plot(imradapt)
par(mfrow = c(1,1))

Using functions to map RR with tmap()

How to write simple custom functions in R (set of instructions that accepts inputs and carries out action on those inputs to return to some output) to spead up repetitive tasks.

This is a function that accepts a single argument, labeled simply x here. The expectation is that x should be the output of the above risk() function. Notice how the function first extracts the spatial relative risk surface (e.g. x$rr), and then assigns the appropriate projection (it got stripped off some where along the way). Then the function extracts the probability map which is the set of pixel-specific p-values. The rasterToContour() function takes this raster and creates contour lines with the specified levels corresponding to a 95% tolerance contour. Finally, the use of the return() tells what should be returned when the function is called

### make prepRaster() function
prepRaster <- function(x){
rr <- raster(x$rr,
             crs = "+init=epsg:5070")
p_raster <- raster(x$P,
                   crs = "+init=epsg:5070")
plines <- rasterToContour(p_raster, levels = c(0.025, 0.975))
return(list(rr=rr,plines=plines)) 
}

Write a function for producing a tmap() panel

### make map () function to create panel maps

make_map <- function(x, bw){
  mtitle <- paste('IMR - ', bw, ' smooth', sep = '') 
  tm_shape(x$rr) +
    tm_raster(palette = '-RdYlGn',
              style = 'cont',
              breaks = c(0.1, 0.6, 0.9, 1.1, 2, 4.9),
              midpoint = NA,
              title = 'IMR SMR') +
    tm_shape(x$plines) + 
    tm_lines(col = 'level',
             legend.col.show = F) + 
    tm_layout(main.title = mtitle,
              legend.format = list(digits = 1))
}

Map the four fixed-bandwith spatial relative risk surfaces using new map function

#convert to raster and extract p-value contours
rr_1000 <- prepRaster(imr1000) 
rr_2000 <- prepRaster(imr2000) 
rr_4000 <- prepRaster(imr4000) 
rr_8000 <- prepRaster(imr8000) 
rr_adapt <- prepRaster(imradapt)

#produce map panels
m1000 <- make_map(rr_1000, '1 km')
m2000 <- make_map(rr_2000, '2 km')
m4000 <- make_map(rr_4000, '4 km')
m8000 <- make_map(rr_8000, '8 km') 
mapadapt <- make_map(rr_adapt, 'adaptive')

tmap_arrange(m1000, m2000, m4000, m8000, mapadapt, ncol = 2)

Using function GWModel for exploratory spatial data analysis (spatial non-stationary) of ATL dataset

The function gwss() stands for geographically weighted summary statistics, and uses the non- parametric spatial weighting of a kernel density function to compute locally varying descriptive statistics such as the mean, median, standard deviation, correlation, and more. But any statistic can be stationary (constant) or non-stationary (spatially varying).

Mapping the observed values

# First map the 4 variables that are %
tm_shape(atl) +
tm_fill(c('pctNOHS', 'pctPOV', 'pctMOVE', 'pctOWNER_OCC'),
        style = 'quantile') + 
  tm_borders()

The Index of Concentration at the Extremes (ICE) ranges from -1 to +1. A value of −1 occurs where everyone in the tract is poor; a value of +1 occurs in tracts where everyone is affluent; a value of 0 suggests that either there is a balance of affluence and poverty, or alternatively that everyone is ‘middle income’. Therefore it makes sense to map it separately because it will inevitably need a divergent color ramp.

tm_shape(atl) + tm_fill('ICE_INCOME_all',
                        style = 'quantile',
                        palette = 'div') + 
  tm_borders()
## Variable(s) "ICE_INCOME_all" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Using GWmodel() to estimate the ‘optimal’ bandwith for estimating the spatially varying mean or median using cross-validation.

#fixed bandwidth selection
bw.gwss.average(atl, vars = c('pctPOV', 'pctPOV',
                              'pctMOVE','pctOWNER_OCC',
                              'ICE_INCOME_all'))
##                   pctPOV   pctPOV  pctMOVE pctOWNER_OCC ICE_INCOME_all
## Local Mean bw   58801.33 58801.33 58801.33     65740.07       65740.07
## Local Median bw 65740.07 65740.07 58801.33     70028.45       65353.39
#adapative bandwidth selection

bw.gwss.average(atl, vars = c('pctPOV', 'pctNOHS',
                              'pctMOVE',
                              'pctOWNER_OCC',
                              'ICE_INCOME_all'),
                adaptive = T)
##                 pctPOV pctNOHS pctMOVE pctOWNER_OCC ICE_INCOME_all
## Local Mean bw      315     268     297          337            333
## Local Median bw    297     297     315          326            315

There are only 345 areal units in this dataset, and this suggests that nearly all of them should be included in the kernel. That would produce very little spatial variation. It would seem that we might balance local information and spatial variation by including no more than 10% of the data in any single kernel density estimation location. So we could choose to use n = 35 neighbors as the definition of our adaptive bandwidth.

Geographically weighted summary statistics using gwss()

atl.ss <- gwss(atl, vars = c('pctPOV', 'pctNOHS',
                             'pctMOVE', 'pctOWNER_OCC',
                             'ICE_INCOME_all'),
               adaptive = T, 
               bw = 35, 
               quantile = T)
## Warning in proj4string(data): CRS object has comment, which is lost in output

Summary gives you information for the range of smoothed values for each statistic and each variable.All of the estimates of the geographically weighted mean end with _LM which stands for local mean. Similarly the estimates of geographically weighted standard deviation end with _LSD for local SD.

Mapping gwas results

names(atl.ss)
##  [1] "SDF"      "vars"     "kernel"   "adaptive" "bw"       "p"       
##  [7] "theta"    "longlat"  "DM.given" "sp.given" "quantile"
summary(atl.ss)
##          Length Class                    Mode     
## SDF      345    SpatialPolygonsDataFrame S4       
## vars       5    -none-                   character
## kernel     1    -none-                   character
## adaptive   1    -none-                   logical  
## bw         1    -none-                   numeric  
## p          1    -none-                   numeric  
## theta      1    -none-                   numeric  
## longlat    1    -none-                   logical  
## DM.given   1    -none-                   logical  
## sp.given   1    -none-                   logical  
## quantile   1    -none-                   logical
summary(atl.ss$SDF)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min     max
## x 1026366.3 1098719
## y -386115.2 -304767
## Is projected: TRUE 
## proj4string :
## [+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0
## +datum=NAD83 +units=m +no_defs]
## Data attributes:
##    pctPOV_LM         pctNOHS_LM        pctMOVE_LM     pctOWNER_OCC_LM 
##  Min.   :0.05065   Min.   :0.01134   Min.   :0.1292   Min.   :0.2142  
##  1st Qu.:0.11977   1st Qu.:0.03180   1st Qu.:0.1666   1st Qu.:0.4210  
##  Median :0.18084   Median :0.05198   Median :0.1857   Median :0.4869  
##  Mean   :0.19734   Mean   :0.05741   Mean   :0.1957   Mean   :0.4891  
##  3rd Qu.:0.27087   3rd Qu.:0.07904   3rd Qu.:0.2159   3rd Qu.:0.5683  
##  Max.   :0.39470   Max.   :0.13690   Max.   :0.3356   Max.   :0.7545  
##  ICE_INCOME_all_LM     pctPOV_LSD       pctNOHS_LSD       pctMOVE_LSD     
##  Min.   :-0.363659   Min.   :0.03047   Min.   :0.01302   Min.   :0.04340  
##  1st Qu.:-0.161687   1st Qu.:0.08096   1st Qu.:0.02774   1st Qu.:0.06214  
##  Median :-0.010174   Median :0.09503   Median :0.03853   Median :0.07097  
##  Mean   :-0.003091   Mean   :0.10274   Mean   :0.04617   Mean   :0.07694  
##  3rd Qu.: 0.180475   3rd Qu.:0.12299   3rd Qu.:0.05589   3rd Qu.:0.08552  
##  Max.   : 0.392141   Max.   :0.19999   Max.   :0.15118   Max.   :0.15943  
##  pctOWNER_OCC_LSD ICE_INCOME_all_LSD  pctPOV_LVar         pctNOHS_LVar      
##  Min.   :0.1285   Min.   :0.09873    Min.   :0.0009282   Min.   :0.0001695  
##  1st Qu.:0.1711   1st Qu.:0.15379    1st Qu.:0.0065551   1st Qu.:0.0007695  
##  Median :0.1930   Median :0.18624    Median :0.0090312   Median :0.0014845  
##  Mean   :0.2030   Mean   :0.18344    Mean   :0.0116555   Mean   :0.0028788  
##  3rd Qu.:0.2360   3rd Qu.:0.21418    3rd Qu.:0.0151262   3rd Qu.:0.0031239  
##  Max.   :0.3127   Max.   :0.29488    Max.   :0.0399940   Max.   :0.0228544  
##   pctMOVE_LVar      pctOWNER_OCC_LVar ICE_INCOME_all_LVar  pctPOV_LSKe     
##  Min.   :0.001884   Min.   :0.01650   Min.   :0.009747    Min.   :-0.6030  
##  1st Qu.:0.003862   1st Qu.:0.02929   1st Qu.:0.023652    1st Qu.: 0.4015  
##  Median :0.005037   Median :0.03727   Median :0.034685    Median : 0.8876  
##  Mean   :0.006367   Mean   :0.04293   Mean   :0.035527    Mean   : 1.0405  
##  3rd Qu.:0.007314   3rd Qu.:0.05572   3rd Qu.:0.045875    3rd Qu.: 1.5071  
##  Max.   :0.025418   Max.   :0.09779   Max.   :0.086957    Max.   : 4.5718  
##   pctNOHS_LSKe      pctMOVE_LSKe     pctOWNER_OCC_LSKe ICE_INCOME_all_LSKe
##  Min.   :-0.2635   Min.   :-0.4316   Min.   :-1.2109   Min.   :-1.0305    
##  1st Qu.: 0.8059   1st Qu.: 0.2789   1st Qu.:-0.4108   1st Qu.:-0.1805    
##  Median : 1.3878   Median : 0.5831   Median :-0.1053   Median : 0.1766    
##  Mean   : 1.7775   Mean   : 0.6647   Mean   :-0.1088   Mean   : 0.1911    
##  3rd Qu.: 2.4145   3rd Qu.: 0.9743   3rd Qu.: 0.1741   3rd Qu.: 0.5539    
##  Max.   : 7.7612   Max.   : 2.6409   Max.   : 1.1014   Max.   : 1.8477    
##    pctPOV_LCV      pctNOHS_LCV      pctMOVE_LCV     pctOWNER_OCC_LCV
##  Min.   :0.2128   Min.   :0.3199   Min.   :0.2346   Min.   :0.2011  
##  1st Qu.:0.4155   1st Qu.:0.4924   1st Qu.:0.3534   1st Qu.:0.3505  
##  Median :0.5774   Median :0.8506   Median :0.3874   Median :0.4275  
##  Mean   :0.5950   Mean   :0.9173   Mean   :0.3945   Mean   :0.4331  
##  3rd Qu.:0.7349   3rd Qu.:1.3048   3rd Qu.:0.4300   3rd Qu.:0.5156  
##  Max.   :1.1784   Max.   :2.2432   Max.   :0.6638   Max.   :0.7493  
##  ICE_INCOME_all_LCV  Cov_pctPOV.pctNOHS   Cov_pctPOV.pctMOVE  
##  Min.   :-127.7859   Min.   :-0.0022931   Min.   :-0.0029894  
##  1st Qu.:  -1.0292   1st Qu.: 0.0009012   1st Qu.: 0.0009079  
##  Median :  -0.3521   Median : 0.0018075   Median : 0.0022298  
##  Mean   :  -0.4957   Mean   : 0.0027749   Mean   : 0.0027372  
##  3rd Qu.:   1.0390   3rd Qu.: 0.0033044   3rd Qu.: 0.0044672  
##  Max.   :  94.7817   Max.   : 0.0180077   Max.   : 0.0098743  
##  Cov_pctPOV.pctOWNER_OCC Cov_pctPOV.ICE_INCOME_all Cov_pctNOHS.pctMOVE 
##  Min.   :-0.039120       Min.   :-0.054587         Min.   :-6.067e-03  
##  1st Qu.:-0.020229       1st Qu.:-0.021778         1st Qu.: 3.601e-05  
##  Median :-0.014517       Median :-0.014607         Median : 3.318e-04  
##  Mean   :-0.015524       Mean   :-0.016514         Mean   : 3.415e-04  
##  3rd Qu.:-0.009564       3rd Qu.:-0.009578         3rd Qu.: 7.484e-04  
##  Max.   :-0.001562       Max.   :-0.002345         Max.   : 3.823e-03  
##  Cov_pctNOHS.pctOWNER_OCC Cov_pctNOHS.ICE_INCOME_all Cov_pctMOVE.pctOWNER_OCC
##  Min.   :-0.028576        Min.   :-0.027722          Min.   :-0.026759       
##  1st Qu.:-0.005198        1st Qu.:-0.005411          1st Qu.:-0.012237       
##  Median :-0.002497        Median :-0.002790          Median :-0.008393       
##  Mean   :-0.004441        Mean   :-0.004505          Mean   :-0.008884       
##  3rd Qu.:-0.001219        3rd Qu.:-0.001574          3rd Qu.:-0.005566       
##  Max.   : 0.006500        Max.   : 0.003936          Max.   : 0.005566       
##  Cov_pctMOVE.ICE_INCOME_all Cov_pctOWNER_OCC.ICE_INCOME_all Corr_pctPOV.pctNOHS
##  Min.   :-0.017867          Min.   :0.007196                Min.   :-0.3170    
##  1st Qu.:-0.008468          1st Qu.:0.020528                1st Qu.: 0.3302    
##  Median :-0.005404          Median :0.031021                Median : 0.5284    
##  Mean   :-0.004998          Mean   :0.032966                Mean   : 0.5006    
##  3rd Qu.:-0.002219          3rd Qu.:0.044465                3rd Qu.: 0.7092    
##  Max.   : 0.014909          Max.   :0.068909                Max.   : 0.9313    
##  Corr_pctPOV.pctMOVE Corr_pctPOV.pctOWNER_OCC Corr_pctPOV.ICE_INCOME_all
##  Min.   :-0.3313     Min.   :-0.9260          Min.   :-0.9212           
##  1st Qu.: 0.1577     1st Qu.:-0.7682          1st Qu.:-0.8552           
##  Median : 0.3367     Median :-0.7020          Median :-0.8058           
##  Mean   : 0.3220     Mean   :-0.6828          Mean   :-0.7813           
##  3rd Qu.: 0.5194     3rd Qu.:-0.6048          3rd Qu.:-0.7381           
##  Max.   : 0.7694     Max.   :-0.3121          Max.   :-0.2913           
##  Corr_pctNOHS.pctMOVE Corr_pctNOHS.pctOWNER_OCC Corr_pctNOHS.ICE_INCOME_all
##  Min.   :-0.555197    Min.   :-0.7423           Min.   :-0.7935            
##  1st Qu.: 0.009498    1st Qu.:-0.4959           1st Qu.:-0.5456            
##  Median : 0.134766    Median :-0.3802           Median :-0.4470            
##  Mean   : 0.145947    Mean   :-0.3462           Mean   :-0.4219            
##  3rd Qu.: 0.300808    3rd Qu.:-0.2283           3rd Qu.:-0.3231            
##  Max.   : 0.683280    Max.   : 0.4213           Max.   : 0.3913            
##  Corr_pctMOVE.pctOWNER_OCC Corr_pctMOVE.ICE_INCOME_all
##  Min.   :-0.8912           Min.   :-0.8172            
##  1st Qu.:-0.7469           1st Qu.:-0.5948            
##  Median :-0.6134           Median :-0.4580            
##  Mean   :-0.5470           Mean   :-0.3573            
##  3rd Qu.:-0.4331           3rd Qu.:-0.1890            
##  Max.   : 0.2776           Max.   : 0.5593            
##  Corr_pctOWNER_OCC.ICE_INCOME_all Spearman_rho_pctPOV.pctNOHS
##  Min.   :0.4761                   Min.   :-0.3137            
##  1st Qu.:0.7364                   1st Qu.: 0.3999            
##  Median :0.8139                   Median : 0.5570            
##  Mean   :0.7983                   Mean   : 0.5370            
##  3rd Qu.:0.8681                   3rd Qu.: 0.7045            
##  Max.   :0.9618                   Max.   : 0.9311            
##  Spearman_rho_pctPOV.pctMOVE Spearman_rho_pctPOV.pctOWNER_OCC
##  Min.   :-0.3390             Min.   :-0.9327                 
##  1st Qu.: 0.2155             1st Qu.:-0.7724                 
##  Median : 0.3742             Median :-0.6998                 
##  Mean   : 0.3469             Mean   :-0.6871                 
##  3rd Qu.: 0.5295             3rd Qu.:-0.6212                 
##  Max.   : 0.7608             Max.   :-0.3108                 
##  Spearman_rho_pctPOV.ICE_INCOME_all Spearman_rho_pctNOHS.pctMOVE
##  Min.   :-0.9457                    Min.   :-0.40721            
##  1st Qu.:-0.8764                    1st Qu.: 0.05727            
##  Median :-0.8393                    Median : 0.17412            
##  Mean   :-0.8093                    Mean   : 0.16972            
##  3rd Qu.:-0.7747                    3rd Qu.: 0.30311            
##  Max.   :-0.3117                    Max.   : 0.58534            
##  Spearman_rho_pctNOHS.pctOWNER_OCC Spearman_rho_pctNOHS.ICE_INCOME_all
##  Min.   :-0.8288                   Min.   :-0.8405                    
##  1st Qu.:-0.5518                   1st Qu.:-0.6476                    
##  Median :-0.4284                   Median :-0.5250                    
##  Mean   :-0.3868                   Mean   :-0.5064                    
##  3rd Qu.:-0.2455                   3rd Qu.:-0.3814                    
##  Max.   : 0.2446                   Max.   : 0.1699                    
##  Spearman_rho_pctMOVE.pctOWNER_OCC Spearman_rho_pctMOVE.ICE_INCOME_all
##  Min.   :-0.8851                   Min.   :-0.8330                    
##  1st Qu.:-0.7314                   1st Qu.:-0.5600                    
##  Median :-0.6195                   Median :-0.4284                    
##  Mean   :-0.5577                   Mean   :-0.3450                    
##  3rd Qu.:-0.4501                   3rd Qu.:-0.1736                    
##  Max.   : 0.3082                   Max.   : 0.4005                    
##  Spearman_rho_pctOWNER_OCC.ICE_INCOME_all pctPOV_Median     pctNOHS_Median    
##  Min.   :0.4549                           Min.   :0.04247   Min.   :0.003663  
##  1st Qu.:0.7123                           1st Qu.:0.09031   1st Qu.:0.014632  
##  Median :0.7973                           Median :0.16100   Median :0.042506  
##  Mean   :0.7746                           Mean   :0.17781   Mean   :0.043469  
##  3rd Qu.:0.8524                           3rd Qu.:0.25192   3rd Qu.:0.068828  
##  Max.   :0.9477                           Max.   :0.39190   Max.   :0.105287  
##  pctMOVE_Median    pctOWNER_OCC_Median ICE_INCOME_all_Median   pctPOV_IQR     
##  Min.   :0.09583   Min.   :0.1919      Min.   :-0.38411      Min.   :0.02725  
##  1st Qu.:0.15863   1st Qu.:0.3920      1st Qu.:-0.19501      1st Qu.:0.08469  
##  Median :0.18618   Median :0.4902      Median :-0.02393      Median :0.11507  
##  Mean   :0.18653   Mean   :0.4950      Mean   :-0.01409      Mean   :0.12983  
##  3rd Qu.:0.20178   3rd Qu.:0.6023      3rd Qu.: 0.17007      3rd Qu.:0.17100  
##  Max.   :0.31080   Max.   :0.8389      Max.   : 0.38298      Max.   :0.33702  
##   pctNOHS_IQR        pctMOVE_IQR      pctOWNER_OCC_IQR  ICE_INCOME_all_IQR
##  Min.   :0.007199   Min.   :0.04923   Min.   :0.06952   Min.   :0.07414   
##  1st Qu.:0.023346   1st Qu.:0.07772   1st Qu.:0.24067   1st Qu.:0.18532   
##  Median :0.036208   Median :0.09437   Median :0.29305   Median :0.23079   
##  Mean   :0.048902   Mean   :0.09872   Mean   :0.31103   Mean   :0.24477   
##  3rd Qu.:0.056780   3rd Qu.:0.11432   3rd Qu.:0.36347   3rd Qu.:0.29328   
##  Max.   :0.196135   Max.   :0.21948   Max.   :0.66631   Max.   :0.55967   
##    pctPOV_QI          pctNOHS_QI          pctMOVE_QI       pctOWNER_OCC_QI  
##  Min.   :-0.84239   Min.   :-0.937694   Min.   :-0.71527   Min.   :-0.7647  
##  1st Qu.:-0.35758   1st Qu.:-0.516389   1st Qu.:-0.28486   1st Qu.:-0.1424  
##  Median :-0.16139   Median :-0.260145   Median :-0.07138   Median : 0.0615  
##  Mean   :-0.12672   Mean   :-0.243326   Mean   :-0.05734   Mean   : 0.0391  
##  3rd Qu.: 0.08367   3rd Qu.: 0.008415   3rd Qu.: 0.14546   3rd Qu.: 0.2285  
##  Max.   : 0.74983   Max.   : 0.496576   Max.   : 0.70005   Max.   : 0.7302  
##  ICE_INCOME_all_QI 
##  Min.   :-0.86704  
##  1st Qu.:-0.23682  
##  Median :-0.03107  
##  Mean   :-0.02864  
##  3rd Qu.: 0.16363  
##  Max.   : 0.88027

There are many sub-objects within the main result object. But the first one, always called SDF has class SpatialPolygonsDataFrame. The is basically the sp version of a polygon spatial file. Examining it more closely ee that it has the information we need to make maps (e.g. it is a spatial object with attribute data).

#map geographically-weighted median

tm_shape(atl.ss$SDF) +
tm_fill(c('pctPOV_Median',
          'pctNOHS_Median', 
          'pctMOVE_Median', 
          'pctOWNER_OCC_Median'),
        style = 'quantile') +
  tm_borders()

#map geographically-weighted IQR
tm_shape(atl.ss$SDF) +
tm_fill(c('pctPOV_IQR',
          'pctNOHS_IQR', 
          'pctMOVE_IQR',
          'pctOWNER_OCC_IQR'),
        style = 'quantile') + 
  tm_borders()

Calculating pseudo p-values for smoothed estimates

Using Monte Carlo permutation testing. The idea with permutation testing, is empirically simulating what the data would look like under the null distribution. Then compare single instance of observed data to the simulated null distribution to describe how unusual the observations are, given what would have been expected due to chance alone.

The function follows a 3-step process. First it will randomly reassign the location of variables of interest n−times (where n specified by user, but typically reasonably large) Second, for each random permutation of the random variable, the summary statistic (e.g. the spatially-weighted local mean of variable) is calculated. Finally, the observed results is compared to the null distribution. If we calculated n = 999 random permutations, then we would have n = 1000 versions of the summarized statistic, including the observed. The pseudo p-value is calculated as the number of times for each spatial unit that the observed value is more extreme than what would be expected under the null. For example if we defined extreme as being something that happens fewer than 5% of the time by random chance alone, then we might classify our observed value as extreme (and thus significant) if our observed value was either less than the lower 2.5% of the null values, or greater than the upper 97.5% of the null values.

p.val <- gwss.montecarlo(atl, vars = c('pctPOV',
                                       'pctMOVE'),
                         adaptive = T,
                         bw = 35,
                         nsim = 499)
## Warning in proj4string(data): CRS object has comment, which is lost in output
summary(p.val)
##    pctPOV_LM        pctMOVE_LM       pctPOV_LSD      pctMOVE_LSD    
##  Min.   :0.0020   Min.   :0.0020   Min.   :0.0040   Min.   :0.0020  
##  1st Qu.:0.2600   1st Qu.:0.2540   1st Qu.:0.2500   1st Qu.:0.2480  
##  Median :0.5060   Median :0.4880   Median :0.5080   Median :0.4960  
##  Mean   :0.5007   Mean   :0.5009   Mean   :0.5008   Mean   :0.5005  
##  3rd Qu.:0.7560   3rd Qu.:0.7420   3rd Qu.:0.7500   3rd Qu.:0.7500  
##  Max.   :0.9980   Max.   :0.9980   Max.   :0.9980   Max.   :0.9980  
##   pctPOV_LVar      pctMOVE_LVar     pctPOV_LSKe     pctMOVE_LSKe   
##  Min.   :0.0040   Min.   :0.0020   Min.   :0.002   Min.   :0.0040  
##  1st Qu.:0.2500   1st Qu.:0.2480   1st Qu.:0.248   1st Qu.:0.2580  
##  Median :0.5080   Median :0.4960   Median :0.500   Median :0.4880  
##  Mean   :0.5008   Mean   :0.5005   Mean   :0.501   Mean   :0.5006  
##  3rd Qu.:0.7500   3rd Qu.:0.7500   3rd Qu.:0.750   3rd Qu.:0.7480  
##  Max.   :0.9980   Max.   :0.9980   Max.   :0.998   Max.   :0.9980  
##    pctPOV_LCV      pctMOVE_LCV     Cov_pctPOV.pctMOVE Corr_pctPOV.pctMOVE
##  Min.   :0.0020   Min.   :0.0040   Min.   :0.0060     Min.   :0.002      
##  1st Qu.:0.2420   1st Qu.:0.2480   1st Qu.:0.2520     1st Qu.:0.248      
##  Median :0.5080   Median :0.5080   Median :0.5060     Median :0.510      
##  Mean   :0.4997   Mean   :0.5011   Mean   :0.5001     Mean   :0.500      
##  3rd Qu.:0.7460   3rd Qu.:0.7440   3rd Qu.:0.7560     3rd Qu.:0.754      
##  Max.   :0.9980   Max.   :0.9980   Max.   :0.9980     Max.   :0.998      
##  Spearman_rho_pctPOV.pctMOVE
##  Min.   :0.0020             
##  1st Qu.:0.2520             
##  Median :0.5020             
##  Mean   :0.5007             
##  3rd Qu.:0.7480             
##  Max.   :0.9980

Converting into a map to visually inspect the spatial variation in the geographically-weighted mean of pctPOV.

# First, create TRUE/FALSE vectors testing whether column 1 (pctPOV_LM) is extreme # I am using 2 significance levels: 90% and 95%
sig95 <- p.val[, 1] < 0.025 | p.val[, 1] > 0.975
sig90 <- p.val[, 1] < 0.05 | p.val[, 1] > 0.95

# Second create a spatial object that ONLY contains significant tracts
atl.sig95 <- atl[sig95, ] %>%
  aggregate(dissolve = T, FUN = mean) # this is sp code to merge adjacent tracts
## Loading required namespace: rgeos
atl.sig90 <- atl[sig90, ] %>% 
  aggregate(disolve = T, FUN = mean)
tm_shape(atl.ss$SDF) + 
  tm_fill('pctPOV_LM',
          style = 'quantile',
          title = 'Local Average % Poverty') +
  tm_borders() +
  tm_shape(atl.sig90) +
  tm_borders(lwd = 2, col = 'blue') +
  tm_shape(atl.sig95) + 
  tm_borders(lwd = 2, col ='red') +
  tm_add_legend(type = 'line',
                labels = c('p < 0.05', 'p < 0.10'),
                col = c('red', 'blue'))

Suggests that there is a great deal of spatial heterogeneity. However, the permutation test suggests that only a few regions in far North Fulton and in West Atlanta have values that are more extreme than expected under an assumption of spatial independence.

Estimating geographically-weighted bivariate statistics

tm_shape(atl.ss$SDF) +
  tm_fill(c('Spearman_rho_pctPOV.pctNOHS',
            'Spearman_rho_pctPOV.pctMOVE',
            'Spearman_rho_pctPOV.ICE_INCOME_all'),
          style = 'quantile') +
  tm_borders()
## Variable(s) "Spearman_rho_pctPOV.pctNOHS" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
## Variable(s) "Spearman_rho_pctPOV.pctMOVE" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Examining whether differences are more extreme than expected under a null hypothesis of spatial independence and spatial stationarity.

p.val <- gwss.montecarlo(atl, 
                         vars = c('pctPOV',
                                  'pctMOVE'),
                         adaptive = T,
                         bw = 35, 
                         nsim = 499)
## Warning in proj4string(data): CRS object has comment, which is lost in output
# First, create TRUE/FALSE vectors testing whether column 1 (pctPOV_LM) is extreme # I am using 2 significance levels: 90% and 95%
sig95 <- p.val[, 13] < 0.025 | p.val[, 13] > 0.975
sig90 <- p.val[, 13] < 0.05 | p.val[, 13] > 0.95

# Second create a spatial object that ONLY contains significant tracts
atl.sig95 <- atl[sig95, ] %>% 
  aggregate(dissolve = T, FUN = mean)
atl.sig90 <- atl[sig90, ] %>% 
  aggregate(disolve = T, FUN = mean)

Map correlation between pctPOV and pctMOVE alon with the significance test

tm_shape(atl.ss$SDF) +
  tm_fill('Spearman_rho_pctPOV.pctMOVE',
          style = 'quantile',
          title = 'Local correlation Poverty\n&
          Residential instability') + 
  tm_borders() +
  tm_shape(atl.sig90) +
  tm_borders(lwd = 2, col = 'blue') +
  tm_shape(atl.sig95) + 
  tm_borders(lwd = 2, col ='red') +
  tm_add_legend(type = 'line',
                labels = c('p < 0.05', 'p < 0.10'),
                col = c('red', 'blue'))
## Variable(s) "Spearman_rho_pctPOV.pctMOVE" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.